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Abstract. Sequential and Quantum Monte Carlo methods, as well as genetic 
type search algorithms can be interpreted as a mean field and interacting particle 
approximations of Feynman-Kac models in distribution spaces. The performance 
of these population Monte Carlo algorithms is strongly related to the stability 
properties of nonlinear Feynman-Kac semigroups. In this paper, we analyze these 
models in terms of Dobrushin ergodic coefficients of the reference Markov tran- 
psj 1 sitions and the oscillations of the potential functions. Sufficient conditions for 

■ uniform concentration inequalities w.r.t. time are expressed explicitly in terms of 
these two quantities. We provide an original perturbation analysis that applies 

■ to annealed and adaptive FK models, yielding what seems to be the first results 
of this kind for these type of models. Special attention is devoted to the par- 

■ ticular case of Boltzmann-Gibbs measures' sampling. In this context, we design 
an explicit way of tuning the number of Markov Chain Monte Carlo iterations 
with temperature schedule. We also propose and analyze an alternative inter- 
acting particle method based on an adaptive strategy to define the temperature 
increments. 
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Introduction 



Feynman-Kac (abbreviate FK) particle methods, also called sequential, quantum 
or diffusion Monte Carlo methods, are stochastic algorithms to sample from a se- 
\ quence of complex high-dimensional probability distributions. These stochastic sim- 

; ulation techniques are of current use in numerical physics [TJ [2J [32] to compute 

ground state energies in molecular systems. They are also used in statistics, signal 
processing and information sciences j6j [151 HH ED] to compute posterior distribu- 
{Sj \ tions of a partially observed signal or unknown parameters. In the evolutionary 

computing literature, these Monte Carlo methods are used as natural population 
search algorithms for solving optimization problems. From the pure mathematical 
viewpoint, these advanced Monte Carlo methods are an interacting particle system 
(abbreviate IPS) interpretation of FK models. For a more thorough discussion on 
these models, we refer the reader to the monograph [14] , and the references therein. 
The principle (see also [18j and the references therein) is to approximate a sequence 
of target probability distributions (rj n ) n by a large cloud of random samples termed 
particles or walkers. The algorithm starts with N independent samples from r)o and 
then alternates two types of steps: an acceptance-rejection scheme equipped with a 
selection type recycling mechanism, and a sequence of free exploration of the state 
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space. 

In the recycling stage, the current cloud of particles is transformed by randomly 
duplicating and eliminating particles in a suitable way, similarly to a selection step 
in models of population genetics. In the Markov evolution step, particles move 
independently one each other (mutation step). 

This method is often used for solving sequential problems, such as filtering (see 
e.g., |BJ I2EJ US])- In other interesting problems, these algorithms also turn out to be 
efficient to sample from a single target measure rj. In this context, the central idea 
is to find a judicious interpolating sequence of measures (i]k)o<k<n with increasing 
sampling complexity, starting from some initial distribution rjo, up to the terminal 
one f] n = 77. Consecutive measures rjk and r/^+i are sufficiently similar to allow for 
efficient importance sampling and/or acceptance-rejection sampling. The sequential 
aspect of the approach is then an "artificial way" to introduce the difficulty of 
sampling gradually. In this vein, important examples are provided by annealed 
models. More generally, a crucial point is that large population sizes allow to cover 
several modes simultaneously. This is an advantage compared to standard MCMC 
methods that are more likely to be trapped in local modes. These sequential samplers 
have been used with success in several application domains, including rare events 
simulation (see [10J), stochastic optimization and more generally Boltzmann-Gibbs 
measures sampling (|18|). 

Up to now, IPS algorithms have been mostly analyzed using asymptotic (i.e. when 
number of particles iV tends to infinity) techniques, notably through fluctuation the- 
orems and large deviation principles (see for instance [16| |2T], |20 [ |23 | |25].|34|. |12| . 
|15| . [B] and [13] for an overview). 

Some non- asymptotic theorems have been recently developped ([HI HZ]), but unfor- 
tunately none of them apply to annalyze annealed and adaptive FK particle models. 
On the other hand, these type of nonhomogeneous IPS algorithms are of current use 
for solving concrete problems arising in numerical physics and engineering sciences 
(see for instance [5] [301 EI], P21 E7J |36j, [33, 38J). By the lack of non-asymptotic 
estimates, these particle algorithms are used as natural heuristics. 
The main contribution of this article is to analyze these two classes of time non- 
homogeneous IPS models. Our approach is based on semigroup techniques and on 
an original perturbation analysis to derive several uniform estimates w.r.t. the time 
parameter. 

More precisely, in the case of annealed type models, we estimate explicitly the 
stability properties of FK semigroup in terms of the Dobrushin ergodic coefficient 
of the reference Markov chain and the oscillations of the potential functions. We 
combine these techniques with non-asymptotic theorems on L p -mean error bounds 
Q25J) and some useful concentration inequalities ( |22[ [26]). Then, we provide pa- 
rameter tuning strategies that allow to deduce some useful uniform concentration 
inequalities w.r.t. the time parameter. These results apply to non homogeneous 
FK models associated with cooling temperature parameters. In this situation, the 
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sequence of measures rj n is associated with a nonincreasing temperature parame- 
ter. We mention that other independent approaches, such as Whiteley's Q40J) or 
Schweizer's (|39]), are based on, e.g., drift conditions, hyper-boundedness, spectral 
gaps, or non-asymptotic biais and variance decompositions. These approaches lead 
to convergence results that may also apply to non-compact state spaces. To our 
knowledge, these techniques are restricted to non-asymptotic variance theorems and 
they cannot be used to derive uniform and exponential concentration inequalities. 
It seems also difficult to extend these approaches to analyze the adaptive IPS model 
discussed in the present article. To solve these questions, we develop a perturbation 
technique of stochastic FK semigroups. In contrast to traditional FK semigroup, the 
adaptive particle scheme is now based on random potential functions that depend 
on a cooling schedule adapted to the variability and the adaptation of the random 
populations. 



The rest of the article is organized as follows. In a preliminary section, we recall 
a few essential notions related to Dobrushin coefficients or FK semigroups. We also 
provide some important non-asymptotic results we use in the further development 
of the article. Section [2] is concerned with the semigroup stability analysis of these 
models. We also provide a couple of uniform //"-deviations and concentration esti- 
mates. In Section [3] we apply these results to Boltzmann-Gibbs models associated 
with a decreasing temperature schedule. In this context, IPS algorithm can be in- 
terpreted as a sequence of interacting simulated annealing algorithms (abbreviate 
ISA). We design an explicit way of tuning the number of Markov Chain Monte 
Carlo iterations with the temperature schedule. Finally, in Section HJ we propose 
an alternative ISA method based on an original adaptive strategy to design on the 
flow the temperature decrements. We provide a non-asymptotic study, based on a 
perturbation analysis. We end the article with //-deviation estimates as well as a 
couple of concentration inequalities. 



STATEMENT OF SOME RESULTS 

Feynman-Kac particle algorithms consist in evolving an interacting particle sys- 
tem (Cn)n — \CnT--iCn) °f s ^ ze on a given state space E. Their evolution 
is decomposed into two genetic type transitions: a selection step, associated with 
some positive potential function G n ; and a mutation step, where the selected par- 
ticles evolve randomly according to a given Markov transition M n (a more detailed 
description of these IPS algorithms is provided in Section 11.4)) . In this context, 

the occupation measures r)^ := — are iV-approximations of a sequence of 

l<i<iV 

measures r] n defined by the FK recursive formulae: 

Vn-l {G n X M n .f) 



Vn(f) 



Vn-l(G n ) 
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for all bounded measurable function / on E (a more detailed discussion on these 
evolution equations is provided in Section 11.3. ip . 

To describe with some precision the main results of the article, we consider the 
pair of parameters (g n , b n ) defined below. 

9n := sup and b n = (5{M n ) := sup \M n (x, A) - M n (y } A)\ 

ace 

The quantity (3(M n ) is called the Dobrushin ergodic coefficient of the Markov tran- 
sition M n . One of our first main results can be basically stated as follows: 

Theorem 1. We assume that 

a 

sup g p < M ana sup b p < — 

p>i P >i a + M 

for some finite constant M < oo and some a G (0,1). In this situation, for any 
n >0, N > 1, y > and f e Bi(E), the probability of the event 

\€U)-Uf)\< r -^^ 

is greater than 1 — e~ y , where r* and r\ are some constants that are explicitly defined 
in terms of (a, M). 

In Section [221 under the same assumptions of Theorem [TJ we also prove uniform 
L p -mean error bounds as well as new concentration inequalities for unnormalized 
particle models. We also extend the analysis to the situation where g n — > 1. 

We already mention that the regularity conditions on b n may appear difficult to 
check since the Markov kernels are often dictated by the application under study. 
However, we can deal with this problem as soon as we can simulate a Markov 
kernel K n such that r] n .K n = rj n . Indeed, to stabilize the system, the designer 
can "add" several MCMC evolution steps next to each M n -mutation step. From 
a more formal viewpoint, the target sequence (i] n ) n is clearly also solution of the 
FK measure- valued equations associated with the Markov kernels M' n = M n .K™ n , 
where iteration numbers m n are to be chosen loosely. This system is more stable 
since the corresponding b' n satisfy 



b' n = m'n) < KAK n ) < bn-ftK: 



m„ 

n) 



In such cases, Theorem [TJ and its extension provide sufficient conditions on the it- 
eration numbers m n to ensure the convergence and the stability properties of the 
algorithm. 

These results apply to stochastic optimization problems. Let V : E — » R be a 
bounded potential function, (3 n a sequence which tends to infinity, and m a reference 
measure on E. It is well known that the sequence of Boltzmann-Gibbs measures 

r) n (dx) oc e~ l3n - v( - x ^m(dx) 
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concentrates on V's global minima (in the sense of m-essinf(V)). In the above 
display, oc stands for the proportional sign. One central observation is that these 
measures can be interpreted as a FK flow of measures associated with potential func- 
tions G n = e-iPn-Pn-ii-V an j M ar kov kernels M n = KJ™ k ° where Kp n is a simulating 
annealing kernel (see Section 13. 2p and m n and ko are given iteration parameters. In 
the further development of this section, we let K be the proposal transition of the 
simulated annealing transition Kp. In this context, the IPS methods can be used to 
minimize V. The conditions on b n and g n can be turned into conditions on the tem- 
perature schedule j3 n and the number of MCMC iterations m n . Moreover, combining 
our results with standard concentration properties of Boltzmann-Gibbs measures, 
we derive some convergence results in terms of optimization performance. In this 
notation, our second main result is basically stated as follows. 

Theorem 2. Let us fix a G (0,1). We assume that for any x 6 E, K k °(x,-) > 
Sv(-) for some measure v on E, some 5 > and some ko > 1. We also assume 
that the temperature schedule /3 P and the iteration numbers m p satisfy the following 
conditions: 

log( eA '° 3c(v ' )+a )e osc(y) -^ 

sup Ap < A and m p > - — 

p>l ' o 

for some constant A. For all e > 0, let p!^ (e) be the proportion of particles ((^) so 
that V((n) > V min + e. Then, for any n > Q, N > 1, y > and for all e' < e, the 
probability of the event 

Nf\ ^ Q~P n ( £ ~ £> ) r \N + r* 2 y 
Pn K ' ~ m £ , N 2 

is greater than l — e~ y , with m £ i = m (V < V m i n + e'), and the same constants (r*, r%) 
as the one stated in Theorem\J\ (with M = e ). 

It is instructive to compare the estimates in the above theorem with the perfor- 
mance analysis of the traditional simulated annealing model (abbreviate SA). Firstly, 
most of the literature on SA models is concerned with the weak convergence of the 
law of the random states of the algorithm. When the initial temperature of the 
scheme is greater than some critical value, using a logarithmic cooling schedule, it 
is well known that the probability for the random state to be in the global extrema 
levels tends to 1, as the time parameter tends to oo. The cooling schedule presented 
in Theorem [2] is again a logarithmic one. In contrast to the SA model, Theorem |5] 
allows to quantify the performance analysis of the ISA model in terms of uniform 
concentration inequalities, that doesn't depend on a critical parameter. 

In practice, choosing the sequence of increments A n = (j3 n — /3 n _i) in advance can 
cause computational problems. To solve this problem, adaptive strategies, where 
increment A n depends on the current set of particles Cn-i ; are of common use in the 
engineering community (see for instance [33, 38], [T3"| [27J EE])- In this context, we 
propose to study the case where the increment is chosen so that 
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N i -A N -V\ 

where e > is a given constant (see Section 14.21 for a detailed description of the 
algorithm). Computationally speaking, e is the expectation of the proportion of 
particles which are not concerned with the recycling mechanism in the selection step. 
We interpret this particle process as a perturbation of a theoretical FK sequence 
7] n associated with a theoretical temperature schedule B n . Our main result is the 
following L p -mean error estimate. 

Theorem 3. For any p > 1, n > 0, N > 1 and any bounded by 1 function f , we 
have 

jj n n 

e(K(/)-^(/)D 1/p <^E II (W + ci)), 

with c n = max tttt) A n = B n — B n _i and B v defined below. 

e-r) n -i{V) 

,0,) B?= M! ; BlH\- ^ 



2P.pl ' zp+i 2P.p\^2p~TT 

Under appropriate regularity conditions on the parameters b n ,g n ,c n , we mention 
that these L p -mean error bounds also provide uniform concentration inequalities. 
The proofs of Theorem [TJ Theorem HI Theorem [3] and related uniform exponential 
estimates are detailed, respectively, in Sections 12.21 Section and Section H~4"l 



1. Some Preliminaries 

1.1. Basic Notation. Let (E,r) be a complete, separable metric space and let £ 
be the cx-algebra of Borel subsets of E. Denote by V(E) the space of probability 
measures on E. Let B(E) be the space of bounded, measurable, real-valued func- 
tions on E. Let B\(E) C B(E) be the subset of all bounded by 1 functions. 

If fx G V(E), f E B(E) and K,K 1 ,K 2 are Markov kernels on E, then fi(f) de- 
notes the quantity f E f(x)fi(dx), K\.Ki denotes the Markov kernel defined by 

(K 1 .K 2 )(x,A)= [ K x {x,dy)K*{v,A), 

J E 

K.f denotes the function defined by 

K.f(x)= [ K(x,dy)f(y) 

J E 

and fi.K denotes the probability measure defined by 

fi.K(A) = / K(x,A)fi(dx). 

J E 
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If G is a positive, bounded function on E, then i/)q : V(E) — > V(E) denotes the 
Boltzmann-Gibbs transformation associated with G, defined by 

V»eV(E),VfeB(E), M^)(f) = ^j^f-- 
For any / G 13(E), let WfW^ = sup|/(x)| and osc(f) = (/ max - / min ). Let O x (E) C 

x&E 

B(E) be the subset of functions / so that osc(f) < 1. For any random variable 
X : Q — » R defined on some probability space (Q, J 7 , P), and any p > 1, ||X|| p 
stands for the L p norm E(|X| p ) ly ' p . Let Vq(E) be the set of random probability 
measures on E. For all p > 1, we denote by <i p the distance on Vn{E) defined for 
all random measures fi, v by 

d p (jl,i>)= sup -#(/)IIp- 

/eOi(B) 

Finally, for any x G -E, <5 X stands for the Dirac measure centered on x. 

1.2. Dobrushin Ergodic Coefficient. Let us recall here the definitions as well as 
some simple properties that will be useful in the following. 

Definition 4. Let fj,, v G V(E), the total variation distance between \i and v is 
defined by 

- v\\ tv = sup{|//(A) - u{A)\ ;AeS}. 

Definition 5. To each Markov kernel K on E, is associated its Dobrushin ergodic 
coefficient (3(K) G [0, 1] defined by 

f3(K) = sup{K(x } A) - K(y, A); x,y G E, A G £} 
or in an equivalent way: 

0(K) = sup{ llfX f~ U f lL ; /i, v G V(E),fi ? u}. 
W-v\\tv 

The parameter fi(K) caracterizes mixing properties of the Markov kernel K. Note 
that function /3 is an operator norm, in the sense that (3(Ki.K 2 ) < (3(Ki).(3(K2), for 
any couple of Markov kernels K x , K 2 - By definition, for any measures fi, v G V(E) 
and any Markov kernel K, we have \\fi-K — u.K\\ tv < (3(K). \\fi — u\\ tv . Otherwise, 
for any function / G B(E), 

(1.1) osc(K.f)<f3(K)-osc(f). 

Further details on these ergodic coefficients can be found in the monograph[14j, such 
as the following lemma that we will need hereinafter. 

Lemma 6. Let \i, v G V(E) and G a positive, bounded function on E satisfying 

sup ^jfr < g, for some finite constant g > 0. In this situation, we have 
x,y£E (y> 

\\y G {fi)-* G {p)\\ tv <g. ||// -HL- 
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1.3. Feynman-Kac Models. We recall here some standard tools related to FK 
models. They provide useful theoretical background and notation to formalize and 
analyze IPS methods (see e.g. [20 :> [22J ES] for further details). 

1.3.1. Evolution Equations. Consider a sequence of probability measures (r] n ) n , de- 
fined by an initial measure t]q and recursive relations 

rjn-x (G n x M n .f) 



;i-2) V/£B(4 Vn(f) 



Vn-l(G r 



for positive functions G n G 8(E) and Markov kernels M n with M n (x, •) G V(E) and 
M n (-, A) G 8\(E). This is the sequence of measures we mainly wish to approximate 
with the IPS algorithm. In an equivalent way, (i] n ) n can be defined by the relation 

Vn = 4>n(Vn-l) 

where <f> n : V(E) — > V(E) is the FK transformation associated with potential 
function G n and Markov kernel M n and defined by 

0n(»7n-l) = i>G n (Vn-l)-M n 

with 

i) Gn (Vn-i)(dx) ■= TFT\ G n(x) ri n „i(dx). 

The next formula provides an interpretation of the Boltzmann-Gibbs transformation 
in terms of a nonlinear Markov transport equation 

^G n (Vn-l)(dy) = (Vn-lSn, V n-i) ( d V) '■= J Vn-l (dx)S n;Vn _ 1 (x, dy) 

with the Markov transition S njT)n defined below 

Sn^^x^y) = e n .G n (x) 5 x (dy) + (1 - s n .G n (x)) ^ Gn (r] n ^ 1 )(dy), 
(for any constant e n > so that e n .G n < 1). This implies 

(1-3) T] n = 77n-l-Kn l7 7„_i with K n , 7?n _ 1 = S^^Mn 

Therefore, i] n can be interpreted as the distributions of the random states X n of a 
Markov chain whose Markov transitions 

(1.4) P (X n+1 edy\X n = x) := K n+hVn (x, dy) 

depend on the current distribution r\ n = Law (X n ) . 

We finally recall that the measures rj n admit the following functional representations: 

rin(f) = ^TT 

7n(l) 

(1 stands for the unit function) with the unnormalized FK measures 7 n defined by 
the formulae 

(1-5) 70 = VO ; ln(f) = ln-l(G n X M n .f). 



NON-ASYMPTOTIC ANALYSIS OF ADAPTIVE AND ANNEALED FEYNMAN-KAC PARTICLE MODELS 

Comparing this definition with (II. 2p . it is clear that the normalizing constant 7„(1) 
satisfies 

n 

(1-6) ln{l) = Uvp-i(G p ). 

p=l 

The special interest given to this quantity will be motivated in section 13.11 

1.3.2. Feynman-Kac Semigroup. An important point is that the semigroup trans- 
formations 

4>p,n ■= 4>n O n _l O • • • O (f) p+1 

admit a comparable structure as each of the 4>k- To be more precise, for each integer 
p, let us define the unnormalized integral operator Q p 

(1.7) V/ G 13(E) , Q p .f = G p .M p .f 

and the composition operators Q p>n defined by the backward recursion 

(1-8) Qp,n — Qp+1- [Qp+2 ■ ■ ■ Qn) — Qp+l-Qp+l,n- 

We use the convention Q n ^ n = Id for p = n. Comparing these definitions with (11.51) . 
it is clear that 7 n = 7 n _i.Q n and more generally 

7n 1fp-Q pn . 

for any p < n. The semigroup <ft Ptn can be expressed in terms of Q p , n with the 
following formulae 

for any / G B(E) and fi G V(E). Finally, if we set 

Pp,n'f 7" and Gp^n Qp,n-^- 

typ,n-^ 

then we find that 

/ / \ / r\ ft(Gp,n-Pp,n-f) 

<t>pMU) = — ; (r \ — > 

or in other words: </> p , n (/i) = ijj Gpn (y).P^ n . 

1.4. The Interacting Particle System Model. The central idea is to approxi- 
mate the measures r\ n by simulating an interacting particle system (Cn)n — (C^> ■ ■ ■ > ^n) n 
of size N so that ^ 

^ = N 5 & "►"too Vn- 

l<i<N 

Of course, the main issue is to make precise and to quantify this convergence. The 
particle model is defined as follows. 

We start with N independent samples ( = (Qj, . . . , £q ) from rj . Then, the particle 
dynamics alternates two genetic type transitions. 

During the first step, every particle evolves to a new particle C, l n randomly chosen 
with the distribution 

S n +i, v »(C n ,dx) := e n+1 .G n+1 (C n ) 8&(dx) + (l - e n+1 .G n+1 (C n )) *G n+1 (Vn)( dx ) 
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with the updated measures 



^ , N) = y- G n+1 (Q) 

* G n+ i\il n J / J n ( 

j= 

This transition can be interpreted as an acceptance-rejection scheme with a recycling 
mechanism. In the second step, the selected particles ( l n evolve randomly according 
to the Markov transitions M n+ \. In other words, for any 1 < i < N, we sample a 

random state Cn+i with distribution M n+ \ {c % n -,dx\. 

In view of (II. 4p . if we replace rj^ by r] n , then ( n coincide with iV independent 
copies of the Markov chains X n defined in (1 1.3ft . On the other hand, by the law of 
large numbers, we have rj^ ~ r]o so that 

Vi - Vo - K i^ - Vo-Ki,*k> = Vi- 

Iterating this approximation procedure, the empirical measure rj^ is expected to 
approximate rj n at any time n > 0. As for the unnormalized measures j n , we define 



7^(i)=n<-i(^) 



(mimicking formula (11. 6p ) and more generally 7^ (/) = rj^{f) x TT^^-i(G p ). Let 

P =i 

us mention (see for instance [15]) that these particle models provide an unbiased 
estimate of the unnormalized measures; that is we have that 

VfeB(E), E (->£(/)) = 7n (/). 

In addition to the analysis of rj^'s convergence, the concentration properties of the 
unbiased estimators 7^(1) around their limiting values 7„(1) will also be considered 
thereafter. 

1.5. Some Non- Asymptotic Results. To quantify the FK semigroup stability 
properties, it is convenient to introduce the following parameters. 

Definition 7. For any integers p < n, we set 



b n := (5(M n ) and b p , n := 0(P } 



p,nj 



G n (x) G p>n (x) 
g n := sup and g PtU := sup — — --. 

x,yeE^n{y) x,y£E (Jp,n{y) 

The quantities g PyU , and respectively b Pjn , reflect the oscillations of the potential 
functions G p>n , and respectively the mixing properties of the Markov transition P Pin 
associated with the FK semigroup Pj „ described in HI .3.2)) . Several contraction 
inequalities of w.r.t. the total variation norm or different types of relative 
entropies can be derived in terms of these two quantities (see for instance|14j). 
The performance analysis developed in Sections 12.11 and 13.21 is partly based on the 
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three non asymptotic inequalities presented below. 

Firstly, the following ZAmean error bound for all / G B%(E) is proven in |25| : 



(1.9) 

where B p are the constants introduced in (10.11) 



B,, 



®(K(f)-vn(f)\Y p < 7=£w> 

V ^ k=0 



k.n 



Secondly, the following exponential concentration inequality is derived in [26]. For 

all / G B\(E) and any e > we have: 

(1.10) 



-1 

~N 



logP(k Ar (/)-r ?n (/)|>^ + .) > 



b*0 +^ + e(2r +^ 

nPn Vn \ 3 



where r n , (3 n and 5* are constants so that: 

( r < 4 V o 3 b 

fin < 4 J^ p= o 9p,n^p,n 

b* n < 2 sup 5- Pin 6 Pin 

0<p<n 

Thirdly, the following concentration inequality for unnormalized particle models 7, 
is provided in [22] (see theorem 6.5). Let 



A' 



Xll) 



li§ '.— x 1 — y 2{x + y/x) and 

hi := x i-> — h \/2x. 
3 



Then, Ve G {+1, -1} and > 0: 

where quantities r*, <7^ and f(n) can be estimated this way: 
t* = sup r 9) „, where r 9) „ satisfy 



0<g<n 



n-1 



P=<J 

• (j^ = cr^ I J where a q satisfy a q < 1; 

„— n \ T n J 



9=0 
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• f(n) satisfy 



(1-14) f{n) < - 2^ 9p+x-9l P Kp- 

0<q<p<n 

2. Non-Asymptotic Theorems 

The formulae (II. 9ft . (ll.lOp and (I1.12p provide explicit non asymptotic estimates 
in terms of the quantities g pn and 6 Pj „. Written this way, they hardly apply to 
any IPS parameters tuning decision, since the only known or calculable objects are 
generally the reference Markov chain M p and the elementary potential functions G p . 
We thus have to estimate g p ^ n and b Pt1l with some precision in terms of the g p and 
b p . This task is performed in Section 12.11 In the second section, Section I2.2[ we 



j V . 

combine these estimates with the concentration results presented in Section 11.51 to 
derive some useful uniform estimates w.r.t. the time parameter. 

2.1. Semigroup Estimates. We start with a series of technical lemmas. 

Lemma 8. Let K be a Markov kernel and G a positive function on E satisfying 
G(x) 

sup < g, for some finite constant g. In this situation, we have that 

x,y<EEG{y) 

sup I#t< 1 +/ 3 w^- 1 )- 

XtyeE K.G(y) 

Proof. Let x, y G E be s.t. K.G(x) > K.G(y). Let us write: 

K.Gjx) K.Gjx) - K.Gjy) P(K)(G max - G rmn ) 

K.G(y) " K.G(y) + ~ G mm 

We check the last inequality using the fact that 

K.G(x) - K.G(y) < osc(K.G) < (3(K).osc(G) = (3(K).(G max - G mm ). 

On the other hand we have K.G(y) = j u G(u)K(y, du) > G m i n . The desired result is 
now obtained taking the supremum over all (x, y) G E 2 . Note that ^ K ^ G ^ ax ~ Gmin "> _|_ 
1 is exactly equal to 1 + (3(K)(g — 1). 

This ends the proof of the lemma. □ 



Lemma 9. Let M be a Markov kernel, Q a not necessarily normalized integral 

Q.l(x) 

operator satisfying sup -— < g, for some finite constant g > 1 and f a bounded, 

x,yeEQ-l{y) 

non negative function. In this situation, the Markov kernel P defined by 

j M.Q./M 



M.Q.l(x) 
satisfies the following property 

f3(P)<g.(3(M).f3(P'). 
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Q.fix) 

In the above display formula, P' is the Markov transition defined by P'.f(x) := — . 

Proof. Note that P.f(x) can be written in this way P.f(x) = $q.i(5 x .M) (P'.f). 
Thus, for any x,y G E, we have that 

\P.f(x)-Pf(y)\ = \{* Q . 1 {6,.M)-* Q . l (5 y .M))(P'.f)\ 

< \\^ Q 48 X .M) - 9 Q .i{6 v .M))\\ t0 -osc(P'.f) . 
By Lemma El this implies that 

\\(*Q.i(6 x .M)-* Q . 1 (6 y .M))\\ tl) < g.\\5 x .M-8 y .M\\ tv < g./3(M). 
Using (HI]), we have osc(P'.f) < 0(P').osc{f). 

This ends the proof of the lemma. □ 
Lemma 10. For any integers p < n, we have: 



(2-1) </ P ,n-l< (fl'fc - 



J k-1 



k=p+l 



(2.2) b p , n < \ l b k 

■9k,n 

k=p+l 

Proof. Let us prove (I2.ip . By definition, we have G p>n = Q p>n .l. Combining (II. 7p 
and fll.8p applied to unit function we have 

Qp-l,n(l) = Qp+l- [{Qp+l ■ ■ ■ Qn) -1] = G p x M p . (Qp 7n .i) . 

This implies that the functions G Pi „ satisfy the following "backward" relations: 

G n>n 1 ) Gp—i n Gp x Mp.G p n 

Then, for any x, y G E, we deduce that 

G p - hn (x) _ G p (x) (M p .Gp t n) (x) 
G p -!, n {y) ~ G p (y) * (M p .G pn ) (y) ' 

Ei E 2 

Notice that Ei < g p (by definition), and by Lemmadl we have E 2 < l+(3(M p ).(g p ^ n — 
1). This shows the following backward inequalities: 

(2.3) g n<n = 1 ; g v -\, n < g p (1 + b p (g p>n - 1)) 

We end the proof of (12. ip by induction. 
To prove ( 12. 2p . we use the formulae 

Qp—i,n-f Gp x Mp.Q pn .f M p .Q pn .f 



Pp—l,n-f 



Qp-i,n-l G p x Mp.Qp n .\ Mp.Q pn .\ 



Q n-f 

Recalling that P p , n -f = ~7^r~ — > we a Pply Lemma[9]to check that /3(P p _i jn ) < f3(M p ).gp in .f3(Pp^ n ) 

Qp,n • 1 

from which we conclude that 
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b p — l,n — bp-9p,n-bp,n- 

We end the proof of ( 12. 2 p by induction. 

This ends the proof of the lemma. □ 
We end this section with a useful technical lemma to control the quantity g p , n b p , n - 
Lemma 11. For any p < n, we have 

n 

9p,nbp,n < { [ {bk-9k-l,n)- 
k=p+l 

Proof. Using Lemma fTU| we have 

n 

9p,nhp.n — 9p,n- |£ ^fc^/A^n 
k=p+l 

— 9p,n-{b p+ ig p+ i, n ).(b p+ 29 p +2,n) ■ ■ ■ {b n -l9n-l,n) -\b n 9n,n) 

=1 

n 

= (9 P ,nbp+l ■ \jln— l.nbri) 

k=p+l 

This ends the proof of the lemma. □ 

The term g Ptn b Pyn is central in the L p -mean error bound (II. 9p . By Lemma [TT] we 
have 

n n n 

^ ] [ b k g k -i. n < +oo =>• 22g p , n bp, n < +oo. 

p=0 k=p+l p=0 

This gives a sufficient condition for a uniform LP bound w.r.t. time n. g p - n bp t n is 
also involved in the estimates of all the quantities defined in section [T75l such as r n , 

2 

f3 n , and others. In addition, by Lemma [BJ we have the stability property 

0) - 4>p,n{ u )\\tv < g P ,nbp } n\\f^ v\\f v . 
This shows that the term g p , n bp,n is central to quantify the stability properties of 
the semigroup Pin . 

2.2. Uniform Concentration Theorems. To obtain uniform bounds w.r.t. the 
time horizon, Lemma [TT] naturally leads to a sufficient condition of the following 

type: 

sup bk-gk-i.n < cl for some a G (0, 1) 

k<n 

In this situation, we prove that g Ptn b p ^ n < a n ~ p , and therefore, using (II. 9p . 

SUp E (|^(/) - Vn(f)\ P ) 1/P < ^ 1 



f<=Bi(E) 



N 1-a 
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with the constants B p introduced in (10.11) . We then fix the parameter a £ (0,1) 
and we look for conditions on the b p so that b^gu-i-n < Q>. This parameter a can 
be interpreted as a performance degree of the iV-approximation model. In order to 
explicit relevant and applicable conditions, we study two typical classes of regularity 
conditions on the potential functions G p . The first one relates to bounded coeffi- 
cients g p (Theorem [T2~]) . In the second one, the parameters g p tend to 1 as p — >■ oo 
(Theorem flUj) . 

The concentration inequalities developed in Theorem [T2l will be described in terms 
of the parameters (r*, r£) defined below. 



(2.4) 



9 (M+a) 2 / 8 | 18(M+q) 2 

2 1-a + V vf^- T ^ 



„* _ 1Q (M+a) 2 | / 8 i 18(M+a) 2 



Theorem 12. VKe assume that 

(2.5) sup g p < M and sup 6 P < — 

p>i p>i a + M 

/or some finite M > 1 . In £/m situation we have the following uniform estimates. 
• The IP-error bound: 

(2.6) suprf p (^,^)< 



n>o ^ x ' 2(1 -a)ViV 

• For any n > 0, N > 1, y > and f £ B\(E), the probability of the event 

(2.7) \v^f)-Vn(f)\< r -^^ 

is greater than 1 — e~ y , with the parameters (r*, r£) defined in ( |i?.^| ). 

• For any n > 0, N > 1, e £ {+1, —1}, 2/ > and / £ B\(E), the probability 
of the event 

(2.8) - log < ^h (y) + r 2 .h (-\ 
n V7n(l)/ iV \nJV, 

greater than 1 — e~ y , with the parameters f\ = 8M ( A ^+ ffl ) anfi ( f 2 — iM_ 

Proof. Firstly, we prove the inequalities 



(2.9) 



g P ,n < M + a 

b p .g p -i : n < ct 

Let us assume that b p < 4 for some A > 1. Then, by Lemma fTOj 

Ji« A/f — 1 1 — (l) n -p a 

fc=p+l -4 
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On the one hand, this estimation implies that g„ n < M+a as soon as A > — 

a 

On the other hand, we can write 

1 1 _ M- 1 1 

Op.g p -l,n < -T[g p -l,n - 1) + "J - + T' 



from which we check that b p .g p _ l n < a as soon as A > M+a+ V^ +a ^ _. ^g 



inequalities (12.91) are then proven using the fact that A\ and A2 are both lower than 

M+a 
a 

• By Lemma [TT] and (12. 9p we have g Ptn b P , n < a n ~ p . Combining this with (11.9)) . 
the L p -error bound (12. 6p is clear. 

• Let us prove (12.71) . which is a consequence of the concentration inequality 

2 

(ll.lOp . Combining the estimations of r„, (3 n and 6* given in section [L5l with 
(12. 9p and g p , n b p , n < a n ~ p , we deduce that 



r.<*^±^, S 2 <A and K < 2. 
1 — a 1 — a z 

These estimations applied in (II . 10p lead to 

A% 2 



logP(|^(/)-^(/)|>^ + £ ) > 



r 2 + n(-^ + E) 



with ri = 18 (, M+a ) and = / -^ 7 . We set y = ^4 r > 0. Given some 

y > 0, we have 



e(y) 



2iV 2 r 2 + n( 4» + e(y)) 



I N 

and then it is clear that 

v(\vZ(f)-Vn(f)\>^+e(y)) <e~*. 
After some various but elementary calculations we prove that 

liv+^-^v 2 — 

The last concentration inequality (12.81) is a consequence of (ll.lOp and ( 12. 9p . 
Indeed, from estimations (I1.13P and ( I1.14p . we can easily show that the 
quantities r* and f(n) satisfy 

4M , _. , 8.M(M + a) 2 

t* < — r and r(n) < — . 

n ~ n(l -a) y ' ~ I- a 

On the other hand, <7 2 is trivially bounded by n. Then we find that 
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) -h (y)+T*a 2 n h 1 l 



Finally, (12. 8p is obtained by making the suitable substitutions. 
This ends the proof of the theorem. 



□ 



Let us now consider the case where g p decreases to 1 as p — > oo. The idea of the 
forthcoming analysis is to find a condition on the b p so that the g pXL are uniformly 



bounded w.r.t. n by g]Xi with 



a 



1 — a 



> 



a 



1 + a 



The concentration inequalities developed in Theorem [T2] will be described in terms 
of the parameters r^(n),rl(n) and f^nj^^r^ri) defined below. 



(2.10) 



rl(n) 



9.«i(n) 
2(1- a) 

18.tti (n) 
1-0 



+ 



+ 



g 4- 



18.m(ra) 



+ 



18.ui(n) 



'2.11) 



{ h{n) 



r 5 ln) 



I6.U2 (n) 

1-a 



& V n , n n < 4 -9i 
3 Zjn>0yn+l' U — 3(l-o) 



4\/2.«3(n) 



The sequences «i(n), Ui{n) and 1*3(71) used in the above formulae are defined by 



Ui(n) 



k «3(n) = Epi^+i 



1/2 



Notice that the sequence Ui(n) tends to 1 by dominated convergence. Sequences 
1*2(77.) and 1*3(77) tend to 1 by Cesaro's theorem. 

Theorem 13. We assume that g p i 1 as p — > 00 and the sequence b p satisfies for 
any p > 1 



(2.12) 



and b n < 



In this situation, we have the following uniform estimates. 



18 FRANQOIS GIRAUD AND PIERRE DEL MORAL 

• The L p -error bound 



(2.13) sup d p (riZ,r) n )< \ 

n>o 2(1 — a)yN 



with the constants B p introduced in W.l\) . 

• For any n>0,N>l,y>0 and f G B\(E), the probability of the event 

(2.14) -,„(/), < m^Mi 

is greater than 1 — e~ y , with the parameters r\{n),r\{n) defined in 112. 10\) . 

• For any n > 0, N > 1, e G {+1, —1}, y > and / G B\(E), the probability 
of the event 



e 



7n(l)^ / -/ ^fy + Vy\ i ~ f y \ i ~ / \ / ^ 



(2 - 15) «' OE (tMJ - ?3( " ) (^J +? H^J +f ^Vn.iV 

zs greater than 1 — e~ y , wif/i i/ie parameters f 3 (n), f 4 , f 5 (n) defined in Ii2.11\) . 
Proof. Firsly, we prove that 



(2.16) ( 12~T2D Vj9<n, 



S'pjn ^ (fl'p - - ■ ' 

9p-i,n-b p < a 



The proof of the first inequality comes from a simple backward induction on p (with 
fixed n), using formula g p -i, n < flp (1 + b p (g p , n ~ 1)) ( see (E3J))- For p = n, g p ^ n is 
clearly smaller than because g ni „ = 1. The second assertion is now immediate. 
Now we assume that g p ^ n < g p X\- in this case, g p -\^ n < gp +a is met as soon as 

q a - 1 



Notice that this estimate is met as soon as b p < ^ a p +1 - , the sequence (g p ) p being 
decreasing. 

• Now that we proved (I2.16P (which implies g p , n b p , n < a n ~ p by Lemma [TT]) . the 
ZAmean error bound (12. 6p comes from a simple substitution in (11. 9p . 

• To prove (12. 14ft . we focus on the quantities j3 n , o* and r n arising in the 
concentration inequality (ll.lOp . With (I2.16P and g p , n b p ,n < a n ~ p , we readily 
verify that 

K 2 <r^ and b*<2. 
1 — a z 

The term r n can be roughly bounded by jz^g^ 1+a \ but another manipulation 
provides a more precise estimate. Indeed, using the fact that b p ^ n .g p ^ n < a n ~ p 
and g p ^ n < g p +i, we prove that 
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p=0 



< 



2(l+a) Q n-p 



p=0 



< 



P>0 



1 — a 



We prove (12. 14ft using the same line of arguments as in the proof of Theorem 
H 

• Let us prove the last concentration inequality (j2.15p . It is mainly a conse- 
quence of the inequality (I1.12p . Starting from the following decomposition 



we need to find some refined estimates of the quantities r*, f n and (t*.<t^) 
To estimate r*, we notice that Vg, <7 p+ i < g p ~q+i, so that 



. n— 1 . n— 1 , n—q—1 

p=g p=g p=0 

. n— 1 

4 



p=0 



Finally, we have that r* < -, where Uq = '^^g p+ ia p < 

We estimate f n , using the following inequalities: 



a 

p>0 



n—l p 



0<q<p<n p=0 q=p—n 



< 8 1A : 
1 - a ' n ^ 9l 




=U2{n) 



Let us conduct a last useful estimation: 
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2 

n— 1 n— 1 / ^ n— q+1 x ~ 

q=0 q=0 \ p=0 

^16 2 1 
ra 2 (l-a) 2 

9=0 V y 



16 

< ; ~~r X 



^ n— 1 



n.(l — a) 2 n 



--(u 3 (n)) 2 



At last, we make the suitable substitutions in (12.171) and obtain the desired 
inequality ()2.15j) . 

This ends the proof of the theorem. □ 



3. Interacting Simulated Annealing Models 

3.1. Some Motivations. We consider the Boltzmann-Gibbs probability measure 
associated with an inverse "temperature" parameter (3 > and a given potential 
function V G B(E) defined by 



(3.1) ti P (dx) = —e- p ' v ^m{dx), 

where m stands for some reference measure, and Zp is a normalizing constant. We 
let n a strictly increasing sequence (which may tend to infinity as n — > oo). In 
this case, the measures r] n = \ip n can be interpreted as a FK flow of measures with 
potential functions G n = e~^ n ~^ 1 ' v and Markov transitions M n chosen as being 
MCMC dynamics for the current target distributions. Indeed, we have 

-Pn.V{x) ? f -P n -i.V(x) 

^ n (dx) = — m{dx) = J^±p-Wn-Pn-i)v<.*l m (dx] 



=G„(x) 



Zp n -, 



This shows /J,p n = ipG n {^l3 n -i)- Let n stand for the FK transformation associated 
with potential function G n and Markov transition M n . We have 



Sampling from these distributions is a challenging problem in many application 
domains. The simplest one is to sample from a complex posterior distribution on 
some Euclidian space E = R d , for some d > 1. Let a: be a variable of interest, associ- 
ated with a prior density p(x) (easy to sample) with respect to Lebesgue measure dx 
on E, and y a vector of observations, associated with a calculable likelihood model 
p(y | x). In this context, we recall that p(y \ x) is the density of the observations 
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given the variable of interest. The density p(x \ y) of the posterior distribution rj is 
given by Bayes' formula 



In the case where r] is highly multimodal, it is difficult to sample from it directly. 
As an example, classic MCMC methods tend to get stuck in local modes for very 
long times. As a result, they converge to their equilibrium 77 only on unpractical 
time-scales. To overcome this problem, a common solution is to approximate the 
target distribution rj with a sequence of measures r) , . . . ,T] n with density 



where (Pk)o<k<n is a sequence of number increasing from to 1, so that 7] is the prior 
distribution of density p(x), easy to sample, and the terminal measure rj n is the target 
distribution i] (see for instance EIU ESI EI])- ^ we take V := x M- — \og(p(y \ x)) 
and m(dx) := p(x)dx, then the % coincide with the Boltzmann-Gibbs measures \xp k 
defined in (13.11) . In this context, IPS methods arise as being a relevant approach, 
especially if rj is multimodal, since the use of a large number of particles allows to 
cover several modes simultaneously. 

The normalizing constant 7 n (l) coincide with the marginal likelihood p(y). Com- 
puting this constant is another central problem in model selection arising in hidden 
Markov chain problems and Bayesian statistics. 

Next, we present another important application in physics and chemistry, known 
as free energy estimation. The problem starts with an un-normalized density of the 
form 



where H(uj, a) is the energy function of state u, k is Boltzmann's constant, T is the 
temperature and a is a vector of system characteristics. The free energy F of the 
system is defined by the quantity 



where z(T, a) is the normalizing constant of the system density. See for instance 
[7J [9j [29] for a further discussion on these ground state energy estimation problems. 

Last, but not least, it is well known that Boltzmann-Gibbs measures' sampling is 
related to the problem of minimizing the potential function V. The central idea is 
that yLp tends to concentrate on V's minimizers as the inverse temperature /3 tends 
to infinity. To be more precise, we provide an exponential concentration inequality 
in Lemma [HI 

In this context, the IPS algorithm can be interpreted as a sequence of interacting 
simulated annealing (abbreviate ISA) algorithms. As they involve a population of N 
individuals, evolving according to genetic type processes (selection, mutation), ISA 
methods also belong to the rather huge class of evolutionary algorithms for global 
optimization. These algorithms consist in exploring a state space with a population 
associated with an evolution strategy, i.e. an evolution based on selection, mutation 



p(x I y) oc p(x).p(y \ x). 



rjk(dx) oc p(x).p(y \ xy h dx 




F(T,a) 



k.T. log(z(T, a)) 
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and crossover. See [31J or [35J and the references therein for an overview. As these 
algorithms involve complex, possibly adaptive strategies, their analysis is essentially 
heuristic, or sometimes asymptotic (see [8] for general convergence results on genetic 
algorithms). The reader will also find in [24j a proof for a ISA method of the a.s. 
convergence to the global minimum in the case of a finite state space, when the time 
n tends to infinity, and as soon as the population size N is larger than a critical 
constant that depends on the oscillations of the potential fitness functions and the 
mixing properties of the mutation transitions. 

The results of the previous sections apply to the analysis of ISA optimization 
methods. Our approach is non-asymptotic since we estimate at each time n, and 
for a fixed population size N the distance between the theoretical Boltzmann-Gibbs 
measure rj n and its empirical approximation r)^ . In some sense, as /3 n — > +00, rj n 
tends to the Dirac measure 5 X * where x* = argmin V(x), as soon as there is a single 

global minimum. So intuitively, we guess that if this distance admits a uniform 

bound w.r.t. time n, then for large time horizon we have 77^ ~ 5 X *. 

In this section we propose to turn the conditions of Theorem [T2] and Theorem [T3] 

into conditions on the temperature schedule to use, and the number of MCMC steps 

that ensure a given performance degree. Then we combine the concentration results 

of section 12.21 with Lemma [TH to analyze the convergence of the IPS optimization 

algorithm. 

3.2. An ISA Optimization Model. We fix an inverse temperature schedule (3 n 
and we set 

• i] n (dx) = fi/3 n (dx) = -J^-e"^ nV ^m(dx); 

fin 

• G n {x) = e- A » y W ; 

• and then g n = e A «- osc ( v ') ) 

where A n are the increments of temperature A n = (3 n — @ n -\. We let Kp the 
simulated annealing Markov transition with invariant measure fip and a proposition 
kernel K(x,dy) reversible w.r.t. m(dx). We recall that Kp(x,dy) is given by the 
following formulae: 

K p (x, dy) = K(x, dy). min (l, e ~^ v ^~ v ^) \/y ^ x 

K p (x, {x}) = 1 - \ y+x K(x, dy). min (l, e -W<»)-^))) 

Under the assumption K °{x, •) > Su(-) for any x with some integer k > 1, some 
measure v and some 5 > 0, one can show (see for instance |3]) that 

(3.2) (3(K k /) <(l- Se~^ k0 ^ 

where AV(ko) is the maximum potential gap one can obtain making ko elementary 
moves with the Markov transition K. One way to control the mixing properties 
of the ISA model is to consider the Markov transition M p = K^°' mp , the simu- 
lated annealing kernel iterated k .m p times. In this case, the user has a choice to 
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make on two tuning parameters, namely the temperature schedule fl p , and the it- 
eration numbers m p . Note that for all b G (0,1), condition b p < b is turned into 

/ \ ra p 

( 1 — Se~^ pAV ^ J < b, which can also be rewritten as follows. 

l og (l) e AF(*o)A 

m p > — - 

We prove now is a technical lemma that we will use in the following. It deals with 
Boltzmann-Gibbs measures' concentration properties. 

Lemma 14. For any (3 > 0, and for all < s' < e, the Boltzmann-Gibbs measure 
fj,p satisfies 

e -P(e-e') 

fJifi (V > V min + e)< , 

where m E i — m(V < V m i n + e') > 0. 

Proof. The normalizing constant Zp of the definition (13. ip is necessary equal to 
e~P y dm. Then we have 



E 



e /3V dm 
v>y min + £ 



e @ v dm + J e $ v dm 

V>V min +E V<V min +e 



< 




Firstly, it is clear that A 1 < e l3( - v ^ +e \ Secondly, e' < e implies {V < V min + e'} C 
{V < V min + e}, then we have 



12 

We end the proof by making the appropriate substitutions. □ 

Combining this Lemma [HJ the theorems of section 12.21 (with indicator test func- 
tion / = l{y>y min+e }) , and the Dobrushin ergodic coefficient estimate (13. 2ft we prove 
the following theorem: 

Theorem 15. Let us fix a E (0, 1). For any e > 0, n > and N > 1, let p% (e) 
denote the proportion of particles s.t. V(££) > V m m + £• We assume that the 
inverse temperature schedule (3 p and the iteration numbers m p satisfy one of these 
two conditions: 
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(1) sup A p < A < oo (e.g. linear temperature schedule) and 

log ^^C)+ n j c A?(h).g„ 

m p > 2 — . 

o 

/ 1 \ e AV(k ).l3 p 

(2) A p I (as p -> ooj and m p > I osc(y).A p + log(-) J . 

In this situation, for any e > 0, n > 0, N > 1, y > 0, and e' G (0, e), £/ie probability 
of the event 

e -Me-e') r *N + r*y 
K(e) < + — -J- 

is greater than 1 — e~ y , with = (1, 2) (and M = e A ) in the case of bounded A p , 
and = (3,4) in the second one. 

/ e -f3 n (e-e')\ 

We distinguish two error terms. The first one, I I , is related to the 

V "V / 

concentration of the Boltzmann-Gibbs measure around the set of global minima of 

/ r * jV _|_ r *y \ 

V. The second one, I ^ 2 j , is related to the concentration of the occupa- 
tion measure around the limiting Boltzmann-Gibbs measure. Besides the fact that 
Theorem [15] provides tuning strategies which ensure the performance of the ISA 
model, the last concentration inequality explicits the relative importance of other 
parameters, including the probabilistic precision y, the threshold t on the propor- 
tion of particles possibly out of the area of interest, the final temperature (3 n and 
the population size N. A simple equation, deduced from this last theorem, such 
/ e -/? n (W) r*N + r*y f\ 

as I = — - = - may be applied to the global tuning of an ISA 



77v N 2 2, 

model, which is generally a difficult task. 

One natural way to choose A p is to look at the number of iterations n\ we need 
to proceed to move from j3 p to (3 n , and to compare it to 772, the iteration number we 
need to proceed to move from (3 P to (3 q , and then from j3 q to (3 n , with (3 P < (3 q < (3 n . 
Roughly speaking, we have seen that the convergence condition was b p < j-, then 

. 77 X ~ (0Sc(V).A p , n + log(i)) _ 

. 77 2 * (osc(V).A p , q + log©) + (osc(V).A q , n + log©) 

where A Pi9 := (3 p — (3 q for p > q. After some approximation technique we find that 

77i < no •<=>■ A D0 A „ > — — . 

P ' q q ' ~ osc(V).AV(k ) 

This condition doesn't bring any relevant information in the case where A p — > 
exept that the error decomposition rj^ — r) n = J2p=o fip^iVp) ~ 4 ) p,n(fip(Vp-i) > underly- 
ing our analysis, is not adapted to the case where A p — > (which can be compared 
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to the continuous time case). Nevertheless, this condition is interesting in the case 
of constant inverse temperature steps. In this situation, the critical parameter A^ 
is given by 



More precisely, above the algorithm needs to run too many MCMC steps to 
stabilize the system. In the reverse angle, when the variation of temperature is too 
small, it is difficult to reach the disired target measure. 



As we already mentioned in the introduction, the theoretical tuning strategies 
developed in section 13.21 are of the same order as the logarithmic cooling schedule of 
traditional simulated annealing ( |3j EJ |2l] ) . In contrast to SA models, we emphasize 
that the performance of the ISA models are not based on a critical initial tempera- 
ture parameter. Another advantage of the ISA algorithm is to provide at any time 
step an N- approximation of the target measure with a given temperature. In other 
words, the population distribution reflects the probability mass distribution of the 
Boltzmann-Gibbs measure at that time. Computationally speaking, the change of 
temperature parameter A p plays an important role. For instance, if A p is taken too 
large, the selection process is dominated by a minority of well fitted particles and the 
vast majority of the particles are killed. The particle set's diversity, which is one of 
the main advantage of the ISA method, is then lost. On the contrary, if A p is taken 
too small, the algorithm doesn't proceed to an appropriate selection. It wastes time 
by sampling from MCMC dynamics while the set of particles has already reached 
its equilibrium. The crucial point is to find a relevant balance between maintaining 
diversity and avoiding useless MCMC operations. 

Designing such a balance in advance is almost as hard as knowing the func- 
tion V in advance. Therefore, it is natural to implement adaptive strategies that 
depend on the variability and the adaptation of the population particles (see for 
instance |3"3j 13"%] . |131 ¥ZT\ |3"E] for related applications). In the general field of evo- 
lutionary algorithms, elaborating adaptive selection strategies is a crucial question 
(see, e.g. [3]) and a challenging problem to design performant algorithms. In the 
case of ISA methods, the common ways to choose A p are based on simple criteria 
such as the expected number of particle killed (see section PO]) . or the variance of the 
weights (Effective Sample Size). All of these criteria are based on the same intuitive 
idea; that is to achieve a reasonable selection. As a result, all of these adaptive ISA 
models tend to perform similarly. 

In |17| . the reader will find a general formalization of adaptive IPS algorithms. 
The idea is to define the adaptation as the choice of the times n at which the resam- 
pling occurs. These times are chosen according to some adaptive criteria, depending 
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on the current particle set, or more generally on the past process. Under weak condi- 
tions on the criteria, it is shown how the adaptive process asymptotically converges 
to a static process involving deterministic interaction times when the population 
size tends to infinity. A functional central limit theorem is then obtained for a large 
class of adaptive IPS algorithms. 

The approach developed in the following section is radically different, with a special 
focus on non-asymptotic convergence results for the ISA algorithm defined in section 
14.21 The adaptation consists here in choosing the (3 increment 1 so that 



where e > is a given constant, at each iteration n. We show that the associated 
stochastic process can be interpretated as a perturbation of the limiting FK flow. 

4.1. Feynman-Kac Representation. Let V £ B(E). To simplify the analysis, 
without any loss of generality, we assume V m m = 0. Let us fix e > 0. For any 
measure ji G V(E), we let the function A^ defined by 

[0,+oo) — ► (0,1] 
A /t = x ^ fi (e" x ' y ) 

This function is clearly decreasing (A M (0) = 1), convex, and different iable infin- 
itely. Moreover, if /x ({V = 0}) = 0, then it satisfies X^x) — > when x — > +oo. 
Therefore, we can define its inverse function n^. 

(0,1] — ► [0,+oo) 
Kfj, = e i — v x so that /j (e~ z ' y ) = e 

This function is again decreasing, convex, infinitely differentiable, takes value for 
e — 1, and it satisfies ^(e) — > +oo when e — > + . 

Now, we let m be a reference measure on E s.t. m ({V = 0}) = 0. We consider 
the sequence (/3„)„ and its associated Gibbs measures rj n = [ip n oc e - ^ nV \m, defined 
recursively by the equation 

(4.1) A n+ i := (f3 n+1 - n ) = K Vn (e). 

In an equivalent way, we have 

X Vn {A n+l )=e or r^e^"^) = e. 

The main objective of this section is to approximate these target measures. For- 
mally speaking, (j] n ) admits the FK structure described in section [XT] with potential 
functions G n (x) = e ~^-V{x) ^ an( j some dedicated MCMC Markov kernels M n . We 
let g n , b n , be the associated oscillations, Dobrushin ergodic coefficients and the cor- 
responding FK transformations <\> n . 
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The solving of the equation rj n (e~ An+1 ' v ) = e can be interpreted as a way to im- 
pose some kind of theoretical regularity in the FK flow. Indeed, according to the 
formula fll.6p (and definition (13. ip ), it is equivalent to find A n+1 s.t. 

7n+l(l) = ( Z{3 n+An+1 \ 
7.(1) V J' 

In other words, the sequence A n is defined so that the normalizing constants 7 n (l) 
increase geometrically, with the ratio 7„ +1 (l)/7„(l) = e. Notice that these incre- 
ments A n are only theoretical, and the corresponding potential functions G n are not 
explicitly known. 

4.2. An Adaptive Interacting Particle Model. As in the classic IPS algorithm, 
we approximate the measures 7] n by simulating an interacting particle system (( n ) n = 
(Cn, ■ ■ ■ , Cn) n o f size N so that 

l<i<N 

We start with A" independent samples from r)o and then alternate selection and 
mutation steps, as described in section 11.41 As we mentioned above, in contrast 
to the classic IPS model, the potential function G n+ \ arising in the selection is not 
known. The selection step then starts by calculating the empirical increment A^ +1 
defined by 



A n+ i := k v n(s) 

or A^(A^ +1 ) = ^{e-^rV) = £ . As the quantity ^{e^ v ) = i £ 

l<i<N 

is easy to calculate for all A > 0, one can calculate A^ +1 by, e.g., performing a 
dichotomy algorithm. If we consider the stochastic potential functions 

then every particle evolves to a new particle C, l n randomly chosen with the following 
stochastic selection transition 

Sn + i^(Cn,dx) := C + i(C) S ck (dx) + (1 - C + i(C)) * G »JvZ)(dx). 

In the above display formula, *&g n ^Vn) stands for the updated measure defined by 

j=l l~ik=l ^n+lKSnJ 

Note that V m - m = ensures < G^ +1 < 1. The mutation step consists in performing 

-TV (H \ 



N r N (t3\ 



—& r i . 



Markov transitions M^ +l (Q n1 •), defined as M n+1 ((^, •) by replacing /3 n+ i by f3^ +1 
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13^ + ^n+v Thus, conditionally to the previous particle set ( n , the new population 
of particles Cn+l is sampled from distribution 

Law(c +1 ,...,e +1 ic-,e) 

(4-2) = (ta-Sn+i^-M^i) ® • • • ® . 

The definition of A^ +1 is to be interpreted as the natural approximation of the 
theoretical relation (14. ip . On the other hand, it admits a purely algorithmic in- 
terpretation. As a matter of fact, conditionnaly to the n-th generation of particles 
(Cn> ■ ■ ■ j Cn) > the probability for any particle Q n to be accepted, i.e. not affected by 
the recycling mechanism, is given by G^ +l {C, l n ) = e~ A ^+ 1 ' v ^"\ Then, the expecta- 
tion of the number of accepted particles is given by £\ e~ A «+ 1,y ^™). But it turns out 
that this quantity is exactly iV x rj^ (e~ A ™+ 1 ' v ), which is equal to N.e by definition of 
A^ +1 . Therefore, e is an approximation of the proportion of particles which remain 
in place during the selection step. In other words, at each generation n, the incre- 
ment A^ +1 is chosen so that the selection step kills less than (1 — e).N particles. 
This type of tuning parameter is very important in practice to avoid degenerate 
behaviours. 

4.3. A Perturbation Analysis. This section is mainly concerned with the conver- 
gence analysis of a simplified adaptive model. More precisely, we only consider the 
situation where the mutation transition in (14.21) is given by the limiting transition 
M n+1 . The analysis of the adaptive model ( 14. 2 p is much more involved, and our 
approach doesn't apply directly to study the convergence of this model. 
Despite the adaptation, the sequence r]^ can be analyzed as a random perturbation 
of the theoretical sequence r] n . Let us fix n and a population state ( n at time n. 
If 4>n+i denotes the FK transformation associated with potential G„ +1 and kernel 
M n+ i, then, by construction, the measure T]^ +1 is close to <ftn+i(Vn)- ^ n particular, 
by the Khintchine's type inequalities presented in [T3] we have 



(4.3) yfeB 1 (E), E(|^ +1 (/)-^ +1 (^)(/)| p |Cn) 1/P < 




with the constants B p introduced in (10.11) . A simple, but important remark about 
the Boltzmann-Gibbs transformations is that for any measure fi and any positive 
functions G and G we have 

qN 

Therefore, if we take H^ +l := " +1 , then the perturbed transformation can 

G n +i 

be written in terms of the theoretical one <fi n +i by 



(4.4) 



(t> n+ l = O lp H N 
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If we use an inductive approach, we face the following problem. Let rj be a deter- 
ministic measure (j] n in our analysis) and fj a random measure (rj^ in our analysis), 
close to 77 under the d p distance (induction hypothesis). We also consider a Markov 
kernel M and the potential functions 

(4.5) G = e~ K ^ £) - v , G = e -^ £) - v , H = % 

G 

and we let <fi (respectively <p) be the FK transformation associated with the po- 
tential function G (respectively G). The question is now: how can we estimate 
d p (4>(rj) , <p(fj)) in terms of d p (rj, fj) ? 

To answer to this question, we propose to achieve a two-step estimation. Firstly we 
estimate the distance between fj and i^fj{fj) (Lemma fT6|) . Secondly we analyze the 
stability properties of the transformation (Lemma fT7|) . This strategy is summer- 
ized by the following synthetic diagram. 

V ► <i>(v) 

fj 

Mv) > Hv) 

Lemma 16. Let r\ G V{E), fj G Vu(E) and let G, G, H be the positive functions 
on E defined by equations fl^.5| ). If i]({V = 0}) = f]({V = 0}) = (a.s.), then for 
all e > 0, we have 

dpi^iiiv),!) < ^ d p (f),r)) . 

Proof. We simplify the notation and we set x := K v (e) and 
We start with the following observation, 



(4.6) Mv)(f)-V(f) 



V(HJ) 
fj(H) 



H-fj(H)).f 



A 2 



for any / G B(E). We notice that H = G/G = e^ x ~ x ^ v ', which leads to the 
lower bound fj(H) = fj (e^ x ~ x >' v ) > fj (e~ x ' v ) = e. The last equality comes from 
the definition of x. We just proved: \A\\ < e^ 1 . On the other hand, we have 
osc(H) = \e^ x ~^' VmBX — l|, so that 



M<v(\&-v(H)\)'\\f\L-£°*<H) 



(4.7) 



< e 



(x-x)-V m 
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The quantity u := (J~ x ~ x )' Vm ^^ — \\ is intuitively small. Next, we provide an estimate 
in terms of the functions A^ and A^. Given u> e Q, if x > x, then we can write 



(\fj(x) - Xj(x^ ) = (Xf)(x) - Xfj(x)) = J -X fl (s)ds. 

=e=\f,(x) 

Furthermore, for all fi G V(E) and s > 0, we have 

-X'^(s) =fjt(V- e~ sV ) > /i {V ■ e - sV —) 

> fx(V)-e- sV ^. 

Then, we have 

(A^)-A,(x))>77(y)-^-[ e -^-]^ 

^max 

p ^Vmax 

= fj{V) (e(z-*)v m ax _ ^ 



g J- v max 

77(F)— u. 

V m 



max 



By symmetry, we have 

x< x =>• (A 7? (x) - A^(x)) > r/<y) — (-«). 

"max 

This yields the almost sure upper bound 

p ^Vmax 

l«l ■ V(V)-T7 < W(x) - Xfj{x)\ . 

^max 

Using the decomposition fj(V) = T](V) + {f)(V) —rj(V)), by simple manipulation, we 
prove that 

H < „ (y) |A,(x)-A r? (x)| + — 

A 3 v «, ' As 

Considering the L p norm of the right hand side of this inequality, one can check that 

• A 3 = (77 -77) (e" z ' y ) so, as osc (e" a " y ) < 1, \\A^\\ p < ^(77,77); 

• as osc(V) = V max , ||v4 4 || p < ^ • d p (r],r]); 

Making the appropriate substitutions, we have 

11 /s 11 ^ ''"max 1 - 7 / ~ \ 

\\u\\ P < ^ d p {r],v)- 

Combining this result with (14.61) and (14. Tj) , we check that 

II WW/) - 3(/)ll p < f^yS ■ d p (V,V) 
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We finally go from WfW^ to ^ j by noticing that ipft(ff)(f) — v(f) 1S equal to 
for any constant function /, and by considering the above inequality taken for 
f = f - fss£±imin ; which satisfies / = 2£^£i. 

oo 

This ends the proof of the lemma. □ 



Lemma 17. Let 77 G V(E), fj G Vq(E), and let be a FK transformation associated 
with a positive function G and a Markov kernel M . If we set g := sup G(x)/G(y) 

x,y£E 

and b := f3(M), then we have 

^ 00?), 00?)) < 9 -b - d p (fj,T]) . 
Proof. Let us fix / G 13(E). We have 

<KW) - <Kv)(J) = v{G ^' f) - <Ki)<J) 

y(Gx[M.(f-<j>( V )(f))]) 
fj (G) 

Let / = M.(f -<t>{ri){f)). By property (Oil . / satisfies osc(f) = osc(M.f) < 
b-osc(f). 

Additionally we have t?(G x /) = rj(G x M.f)-rj (g x = 0. So we obtain 



m(f)-<Kv){f) 



r){G x /) ^(G x /) 



77(G) 77(G) 



(v-v)(Gxf) 



Firstly, we notice that 
can be rewritten as 



G, 



77(G) 



< 



G r 



Gn 



= «G) ( "-" )( G^ X/) - 
< (7. On the other hand, we notice that / 



/ = M. (f - ip G (r])(M.f)) = (M.f) - Mv)(M.f). 

It follows that / max > and / min < 0. Under these conditions, osc (^ G G x / ) < 
osc(f ) < b ■ osc(f). We conclude that 



E 



Gr, 



■(7}-77) 



G 



Gr, 



x/ 



77(G) 

This ends the proof of the lemma 



p\ i/p 



<g-E 



E 



(77-77) 



G 



G u 



x/ 



p-i i/p 



<g-b-dp(rj,r})-osc(f). 



□ 
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4.4. Non- Asymptotic Convergence Results. This section is mainly concerned 
with the proof of Theorem [3] stated on page [6j We also deduce some concentration 
inequalities of the ISA adaptive model. We start with the proof of Theorem [3j 

Proof of Theorem 0- 

We fix p > 1 and we let e n = Y^k=o l~ir=fc+i ^i9ii^ + c «)- We notice that this sequence 
can also be defined with the recurrence relation e n+ i = 1 + g n+ ib n +i(l + c n+ i) • e n 
starting at e?o = 1. We also consider the following parameter. 



e " := 7T^~' SU P \\Vn(f) -Vn(f)L 

za p feOi(E) 

We use an inductive proof to check that the proposition IH(n) = {e n < e n } is met 
at any rank n. As T]q is obtained with N independant samples from t]q, IH(0) is 
given by the Khintchine's inequality. Now suppose that IH(n) is satisfied. 
According to the identity (14. 4p . we can write the following decomposition. 

(4-8) = W +1 - ^ +1 (Vn)) + Un+1 Uh^JVu)) ~ MVn)) 

" A ' " * ' 

Given ( n , and using (14. 3p . we know that for all function / G Oi(E), we have 

— — ||Al(/)|L < 1. To estimate A 2 , we start by decomposing (ip H N (rj^) — i] n 
2r> p p V n+1 

in this way: 

~Vn= (V" +1 (^) ~ ) + W - Vn) ■ 



By the induction hypothesis, we have j^- ■ sup ||Q2(/)|L — e n . Therefore, by 

P feOi(E) 

Lemma fT6| we find that 



N 

sup \\Qi(f)\L < c n+1 ■ e n . 



2B p / e oi(B) 



Thus the measures "4>H N +1 (Vn) e ^ Vn satisfy 
x/N 



sup 



2-Bp f&o x (E) 
Applying Lemma [T7] we also have 



TP H » + M)(f)-Vn(f) <(l + C n+ l) 



N 

sup ||A 2 (/)|| p < g n+1 b n+1 (l + c n+ i) • e n . 



2B p fediE) 

Back to (14. 8p . we conclude that e n+ i < 1 + g n +i&n+i(l + c n +i) • e n = e n+ \. 

This ends the proof of the theorem. □ 
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We are now in position to obtain a sufficient condition for uniform concentration 
and L p -mean error bounds w.r.t. time. 

Corollary 18. If the condition b n g n (l + c n ) < a is satisfied for some a < 1 and any 
n, then we have the uniform error bounds 



(4.9) 



dp {Vn,Vn) < 



B„ 



2(1- aWN 



for any p, with the constants B p introduced in W.l\) . In addition, for any f £ B\(E), 
we have the following concentration inequalities: 



(4.10) 
and 



Vs>0, P(|^(/)-77„(/)|>s)<r 1 (VW. S ) 



(4-11) V// > 1. 7[\v%(f)-Vn(f)\> 

with the parameters 



n 

r 



N 



< e~ y , 



e^il - a) 
2 



1 - a 



Proof. The inequality (14. 9 j) is a direct consequence of Theorem [3j 
Let us fix n, f £ B\(E) and set 



X:=|7^(/)-77 ft (/)| and e w := 



[l-a)VN' 



In this notation, we have ||X|| < B p ■ for any p > 1. Let us fix s > 0. By Markov 
inequality, for all t > we have 



(4.12) P (X > s) = P (e tx > e ts ) < e~ st E (e tx ) . 

Using ( 10. ip . we estimate the Laplace transform E (e <x ). 



p>0 ^ 



tKXP 



pi 



t 2 ^(2p)\ ^^ P+ i e ^+i ( 2p+ i)! 



< 



2p+l 



^ (2p)! 2f.p! ^ (2p + 1)! 2P.p! v ^+T 



< (1 + teAr)e~ 



2,2 
JV 
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S 
















s 


i 




< — 


2 

e 











which is equivalent to the first concentration inequality f)4.10p . 
For the second one, we use the inequality 



with u = j-. Then, one can easily show that for all y > 1 and u > 2, the equation 
y=^£-u + lis equivalent to u = 1 + a/1 + 4(y - 1) < 2(1 + v /y). Thus > 1: 



¥(X>2e N (l + ^))<e-y, 

which is equivalent to (14. lip . 

This ends the proof of the corollary. 



□ 
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